Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
尽管在训练有素的语言模型方面取得了进展,但缺乏用于预训练的句子表示的统一框架。因此,它要求针对特定方案采用不同的预训练方法,并且预培训的模型可能受到其普遍性和表示质量的限制。在这项工作中,我们扩展了最近提出的MAE风格的预训练策略RELOMAE,以便可以有效地支持各种句子表示任务。扩展的框架由两个阶段组成,在整个过程中进行了逆转录。第一阶段对通用语料库进行了逆转,例如Wikipedia,BookCorpus等,从中可以从中学习基本模型。第二阶段发生在特定于领域的数据上,例如Marco和NLI,在该数据中,基本模型是基于逆转和对比度学习的。这两个阶段的训练前输出可能会服务于不同的应用,其有效性通过全面的实验验证。具体来说,基本模型被证明对零射击检索有效,并且在贝尔基准上取得了显着的性能。继续进行预训练的模型进一步受益于更多的下游任务,包括针对马可女士的特定领域的密集检索,自然问题以及句子嵌入式标准STS的质量和延性端的转移任务。这项工作的经验见解可能会激发预训练的句子代表的未来设计。我们的预培训模型和源代码将发布给公共社区。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
由于知识图(kgs)的不完整,旨在预测kgs中未观察到的关系的零照片链接预测(ZSLP)引起了研究人员的最新兴趣。一个常见的解决方案是将关系的文本特征(例如表面名称或文本描述)用作辅助信息,以弥合所见关系和看不见的关系之间的差距。当前方法学习文本中每个单词令牌的嵌入。这些方法缺乏稳健性,因为它们遭受了量不足(OOV)的问题。同时,建立在字符n-grams上的模型具有为OOV单词生成表达式表示的能力。因此,在本文中,我们提出了一个为零链接预测(HNZSLP)的层次N-gram框架,该框架考虑了ZSLP的关系n-gram之间的依赖项。我们的方法通过首先在表面名称上构造层次n-gram图来进行起作用,以模拟导致表面名称的N-gram的组织结构。然后,将基于变压器的革兰amtransformer呈现,以建模层次n-gram图,以构建ZSLP的关系嵌入。实验结果表明,提出的HNZSLP在两个ZSLP数据集上实现了最先进的性能。
translated by 谷歌翻译
从大规模训练数据集中获利,神经结构设计和高效推断的进步,联合嵌入成为解决交叉模态检索的主导方法。在这项工作中,我们首先表明,尽管他们有效性,但最先进的联合嵌入从长期的封闭问题中遭受显着遭受显着的困扰,其中少数画廊嵌入形成了许多查询的最近邻居。从NLP文献中汲取灵感,我们制定了一个称为QueryBank归一化(QB-Norm)的简单但有效的框架,该框架重新归属查询相似度,以解释嵌入空间中的集线器。 qb-norm提高了检索性能而不需要再培训。与事先工作不同,我们显示QB-​​Norm有效地工作,而不会对任何测试设置查询进行操作。在QB-Norm框架内,我们还提出了一种新颖的相似性归一化方法,动态倒置Softmax,比现有方法明显更强大。我们在一系列交叉模态检索模型和基准中展示了QB-Norm,在那里它一直增强超出现有技术的强基线。代码可在https://vladbogo.github.io/qb-norm/处获得。
translated by 谷歌翻译
跨不同层的特征的聚合信息是密集预测模型的基本操作。尽管表现力有限,但功能级联占主导地位聚合运营的选择。在本文中,我们引入了细分特征聚合(AFA),以融合不同的网络层,具有更具表现力的非线性操作。 AFA利用空间和渠道注意,以计算层激活的加权平均值。灵感来自神经体积渲染,我们将AFA扩展到规模空间渲染(SSR),以执行多尺度预测的后期融合。 AFA适用于各种现有网络设计。我们的实验表明了对挑战性的语义细分基准,包括城市景观,BDD100K和Mapillary Vistas的一致而显着的改进,可忽略不计的计算和参数开销。特别是,AFA改善了深层聚集(DLA)模型在城市景观上的近6%Miou的性能。我们的实验分析表明,AFA学会逐步改进分割地图并改善边界细节,导致新的最先进结果对BSDS500和NYUDV2上的边界检测基准。在http://vis.xyz/pub/dla-afa上提供代码和视频资源。
translated by 谷歌翻译
Convolutional neural network-based approaches for semantic segmentation rely on supervision with pixel-level ground truth, but may not generalize well to unseen image domains. As the labeling process is tedious and labor intensive, developing algorithms that can adapt source ground truth labels to the target domain is of great interest. In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation. Considering semantic segmentations as structured outputs that contain spatial similarities between the source and target domains, we adopt adversarial learning in the output space. To further enhance the adapted model, we construct a multi-level adversarial network to effectively perform output space domain adaptation at different feature levels. Extensive experiments and ablation study are conducted under various domain adaptation settings, including synthetic-to-real and cross-city scenarios. We show that the proposed method performs favorably against the stateof-the-art methods in terms of accuracy and visual quality.
translated by 谷歌翻译
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
translated by 谷歌翻译
Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and ensure solution generalization outside of prior seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant resulting in nuisance novelty, such as never before seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty, and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol as an algorithm for reproducing experiments using the KOWL-718 benchmark will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, as well as be extended as further updates to Kinetics are released.
translated by 谷歌翻译
Quantum computing (QC) promises significant advantages on certain hard computational tasks over classical computers. However, current quantum hardware, also known as noisy intermediate-scale quantum computers (NISQ), are still unable to carry out computations faithfully mainly because of the lack of quantum error correction (QEC) capability. A significant amount of theoretical studies have provided various types of QEC codes; one of the notable topological codes is the surface code, and its features, such as the requirement of only nearest-neighboring two-qubit control gates and a large error threshold, make it a leading candidate for scalable quantum computation. Recent developments of machine learning (ML)-based techniques especially the reinforcement learning (RL) methods have been applied to the decoding problem and have already made certain progress. Nevertheless, the device noise pattern may change over time, making trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce the computational complexity. Moreover, increasing the number of trained policies can further improve the agent's performance. Our results open a way to build more capable RL agents which can leverage previously gained knowledge to tackle QEC challenges.
translated by 谷歌翻译